Abstract — A major hurdle in the development of soft and hard/soft data fusion systems is the inability to determine the practical performance differences between fusion operators without the burdens associated with human testing. Drift diffusion models of human responses (i.e., decisions, confidence assessments, and response times) from cognitive psychology can be used to gain a sense of the performance of a fusion system during the design phase without the need for human testing. The majority of these models were developed for binary decision tasks, and furthermore, the few models which can operate on M-ary decision tasks are not yet able to generate subject confidence assessments. The current study proposes a method for realizing human responses over an M-ary decision task using pairwise successive comparisons of related binary decision tasks. We provide an example based on the two-stage dynamic signal detection models developed by Pleskac and Busemeyer (2010), where subjects were presented with a pair of lines on a computer screen, asked to determine which of the two lines was longer, and asked to assess their confidence in their decision using a subjective probability scale. M-ary human opinions were simulated for this line length task and used to assess the performance of several fusion rules, namely: Bayes’ rule of probability combination, Dempster’s Rule of Combination (DRC), Yager’s rule, Dubois and Prade’s rule (DPR), and the Proportional Conflict Redistribution rule #5 (PCR5). When taking source reliability into account in the combination, Bayes’ rule of probability combination and DRC exhibited the most accurate performance (i.e., the largest amount of specific evidence committed towards the true outcome) for this task. Yager’s rule and DPR exhibited inferior performance across all simulated cases.

Index Terms — Data fusion, Dempster-Shafer Theory, Human Simulation, Expert reasoning systems, Belief fusion

I. Introduction 

The use of human opinions in data fusion systems is a current topic of interest. Human-generated data, often categorized as “soft data,” may provide a level of insight and intuition that is not always captured by electronic, optical, mechanical, or other “hard” sensors. However, it is not easy to develop a statistical characterization for human decision makers [1]. Methods for determining the performance of data fusion systems involving inputs from humans rely mostly on the use of examples/counterexamples (e.g., [2], [3]) or on predetermined data sets that were developed through direct human testing (e.g., [4]-[6]). Models of human decision-making from cognitive psychology present an opportunity to simulate the performance of soft and hard/soft fusion systems flexibly and accurately without many of the burdens associated with human testing.



The majority of studies which have employed models of human decision makers looked at how task reward structures influence human decision-making strategies when the human acts as a director of information (e.g., when humans make choices in response to evolving system performance metrics, as in [7]). Much less work has been done on using human decision-making models for assessing the performance of fusion systems in which the human acts as a source of information (e.g., when humans make choices regarding the state of a certain phenomenon and assess their level of confidence in these choices, as in [6], [8]).

Drift diffusion models [9] of human responses have been proposed in cognitive psychology as a means of accurately capturing the dynamics and relationships between human decision-making and response time on both binary (e.g., [10]) and M-ary (e.g., [11]) decision problems. Little work has been done regarding the incorporation of human confidence assessments in such drift diffusion models, and the majority of effort in this area has focused on binary decision problems [12]. We have previously shown how it is possible to assess fusion performance using models of binary human responses in [13], [14]. The current study proposes a method for extending drift diffusion models of human decision making, confidence assessment, and response time to related multihypothesis (M-ary) decision tasks. Specifically, we make use of the two-stage dynamic signal detection (2DSD) model of [12] to produce subjective probabilities on an M-ary decision task using pairwise successive comparisons of binary decision tasks. As a motivating example, we use the 2DSD human parameters estimated in [12] for a line length discrimination task, in which the authors positioned subjects in front of a computer monitor, presented them with two lines at a time, and asked them to decide which line was longer and to rate their confidence in that decision. We apply a successive pairwise comparison algorithm to the binary line length discrimination task to simulate human responses on an M-ary line length discrimination task (i.e., subjects are instructed to choose, and assess their decision confidence for, the longest line amongst M lines). The subjects from the line length discrimination task in [12] and our successive pairwise comparison technique are used to assess the accuracy and precision of combining human responses using Bayes’ rule of probability combination (i.e., Bayes’ Theorem), Dempster’s Rule of Combination (DRC), Yager’s Rule, Dubois and Prade’s rule (DPR), and the Proportional Conflict Redistribution Rule #5 (PCR5) under varying numbers of decision alternatives (i.e., sets of lines differing in length).

The remainder of this work is organized as follows. Section II describes the 2DSD human response model employed here as it relates to the line length discrimination task example of [12]. Section III describes the formulation of the M-ary extension methodology using pairwise successive comparisons. Section IV describes an M-ary fusion simulation for the line length discrimination task using Bayes’ rule of probability combination, DRC, Yager’s Rule, DPR, and PCR5. Each fusion operator was used to combine belief mass assignments (BMAs) generated using the M-ary extension methodology and the 2DSD models provided in [12]. The performance of each operator was determined by calculating the average nearness of the combined BMAs to a BMA which assigns the true outcome full belief. The results of the simulation are described in Section V. After combination of twelve or more sources, Bayes’ rule of probability combination and DRC were found to be the most accurate when statistical evidence relating to the subjects’ ability to make accurate confidence assessments was available. PCR5 was found to be at least as accurate as the best decision-maker in the combination across all fusion cases. Yager’s rule and Dubois and Prade’s rule exhibited inferior performance.

II. Human Simulation Methodology 
A. Two-Stage Dynamic Signal Detection [12] 

Two-stage dynamic signal detection (2DSD) is a recently developed model that accounts for a wide range of phenomena in human decision making while also modeling confidence assessments [12]. Let $\mathcal{A} = \{A, \bar{A}\}$ represent a binary decision task consisting of the alternatives $A$ and $\bar{A}$. In 2DSD, the internal evidence $L(t)$ accumulated in favor of alternative $A$ over $\bar{A}$ at time $t$ is given by the stochastic difference equation

$$\Delta L(t) = \delta\,\Delta t + \sqrt{\Delta t}\,\epsilon(t + \Delta t), \qquad L(0) = L_0, \tag{1}$$

where $\delta$ is known as the drift rate and $\epsilon(t)$ is a simulated white noise process with zero mean and variance $\sigma^2$. The value $\sigma$ is known as the drift coefficient. The drift rate $\delta$ is either positive or negative, depending on whether $A$ or $\bar{A}$ is true. To account for trial variability, the drift rate $\delta$ and the initial condition $L_0$ can be chosen on a per-trial (or per-simulation) basis via $\delta \sim \mathcal{N}(\nu, \eta^2)$ (normally distributed) and $L_0 \sim \mathcal{U}(-0.5 s_z, 0.5 s_z)$ (uniformly distributed); here $\nu$ and $\eta$ are the subject mean drift rate and drift rate standard deviation, respectively, and $s_z \in [0, \infty)$ is the size of an interval containing the initial condition $L_0$. The evidence accumulation is simulated until one of the thresholds $\theta_A$ or $-\theta_{\bar{A}}$ is crossed (where $-\theta_{\bar{A}} < L_0 < \theta_A$). A decision $a \in \mathcal{A}$ is determined such that

$$a = \begin{cases} A, & L(t) > \theta_A, \\ \bar{A}, & L(t) < -\theta_{\bar{A}}, \\ \text{wait}, & \text{otherwise}, \end{cases} \tag{2}$$

and the time at which a threshold is first crossed is the decision time $t_d$.



Let $P^{(a)} = [p_1^{(a)}, \ldots, p_{K_a}^{(a)}]$ denote the $K_a$ possible confidence values associated with choosing $a \in \mathcal{A}$ at time $t_d$. The assigned confidence level $p \in P^{(a)}$ associated with deciding $a$ after waiting until $t_c = t_d + \tau$ is given as

$$p = p_i^{(a)} \quad \text{when} \quad L(t_c) \in \left[c_{i-1}^{(a)}, c_i^{(a)}\right), \tag{3}$$

where $c_0^{(a)} = -\infty$ and $c_{K_a}^{(a)} = \infty$ for each $a \in \mathcal{A}$. The value $\tau$ is known as the interjudgment time. The remaining confidence bin parameters $C^{(a)} = [c_1^{(a)}, \ldots, c_{K_a-1}^{(a)}]$ are chosen such that $c_{i-1}^{(a)} < c_i^{(a)}$ for each $i \in \{1, \ldots, K_a - 1\}$ and each $a \in \mathcal{A}$.

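In summary, a 2DSD realization produces the subjective probability assignment

$$P_{\mathcal{A}}(a) = p, \tag{4}$$

$$P_{\mathcal{A}}(\bar{a}) = 1 - p, \tag{5}$$

where $\bar{a}$ denotes the alternative in $\mathcal{A}$ that was not chosen.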

The authors in [12] suggest the following additional parameter restrictions to simplify the 2DSD implementation.

• The decision thresholds for both alternatives can be chosen symmetrically (i.e., $\theta_A = \theta_{\bar{A}} = \theta$).

• The confidence bins for both choices can be set equal (i.e., $C^{(A)} = C^{(\bar{A})} = C$).

• The confidence values can be fixed for all subjects and all alternatives (e.g., $P^{(A)} = P^{(\bar{A})} = [0.50, 0.60, \ldots, 1.00]$).

• The drift coefficient can be fixed for each subject (e.g., $\sigma = 0.1$).

Applying these simplifications results in the 10-tuple

$$S = \{\nu, \eta, s_z, \theta, \tau, c_1, c_2, c_3, c_4, c_5\} \tag{6}$$

for each subject. The authors of [12] suggest using the quantile maximum probability method [15] to estimate $S$ using statistics relating to subject decisions, confidence assessments, and response times.
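The two stages in (1)-(3), together with the simplifications behind (6), can be sketched as a small simulator. This is a minimal Python sketch under stated assumptions, not the implementation used in [12] or in this study: it assumes symmetric thresholds $\pm\theta$, $\sigma = 0.1$ and $\Delta t = 0.001$ as above, applies the shared cut points $C$ to the evidence signed in the direction of the chosen alternative (so that strong evidence for $\bar{A}$ also maps to high confidence), and adds a hypothetical `max_steps` guard that reports a persisting "wait" state as choice 0. All function and parameter names are illustrative.

```python
import numpy as np

# Confidence scale fixed across subjects in [12]: 0.50, 0.60, ..., 1.00
CONF_VALUES = (0.5, 0.6, 0.7, 0.8, 0.9, 1.0)


def simulate_2dsd(nu, eta, s_z, theta, tau, cuts, conf_values=CONF_VALUES,
                  sigma=0.1, dt=0.001, rng=None, max_steps=1_000_000):
    """Simulate one 2DSD trial with symmetric thresholds +/- theta.

    Returns (choice, confidence, t_d): choice is +1 for A (upper threshold),
    -1 for A-bar (lower threshold), or 0 if no threshold is reached.
    """
    rng = np.random.default_rng() if rng is None else rng
    delta = rng.normal(nu, eta)                # per-trial drift rate ~ N(nu, eta^2)
    L = rng.uniform(-0.5 * s_z, 0.5 * s_z)     # starting point L0 ~ U(-s_z/2, s_z/2)
    noise_sd = sigma * np.sqrt(dt)             # std of sqrt(dt) * eps with Var(eps) = sigma^2

    # Stage 1, eqs. (1)-(2): accumulate evidence until a threshold is crossed
    steps = 0
    while -theta < L < theta and steps < max_steps:
        L += delta * dt + noise_sd * rng.standard_normal()
        steps += 1
    if -theta < L < theta:                     # hypothetical guard: still "waiting"
        return 0, None, steps * dt
    choice = 1 if L >= theta else -1
    t_d = steps * dt

    # Stage 2, eq. (3): keep accumulating for the interjudgment time tau
    n_post = int(round(tau / dt))
    L += delta * dt * n_post + noise_sd * rng.standard_normal(n_post).sum()

    # Bin the final evidence into [c_{i-1}, c_i); mirrored for A-bar (assumption)
    idx = int(np.searchsorted(cuts, choice * L, side="right"))
    return choice, conf_values[idx], t_d
```

Each call yields one (decision, confidence, decision time) triple; in the M-ary extension below, one such realization would be drawn per pairwise comparison, with the mean drift rate taken from the linear fit in (8).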

B. Binary Line Length Task Overview [12] 

In the line length discrimination task modeled in [12], subjects were shown a 32.00 millimeter long line paired with either a 32.27, 32.59, 33.23, 33.87, or a 34.51 millimeter long line. For each line pair, the subjects were asked to identify which of the two lines was longer and to assess their decision confidence using the subjective probability scale {0.50, 0.60, ..., 1.00}. The time step of the simulator was fixed in [12] at $\Delta t = 0.001$ for each subject. Five different mean drift rates, $\nu_1$ through $\nu_5$, were found for each subject, relating to the side-by-side comparison of the 32.00 millimeter long line with the 32.27, 32.59, 33.23, 33.87, and 34.51 millimeter long lines, respectively. The parameter values used to simulate each subject can be found in [12, Tables 3 and 6]. Also, in [12, Table 6], separate decision thresholds $\theta$ were determined for two cases of the line length discrimination task, namely when subjects were asked to focus on fast responses and when subjects were asked to focus on accurate responses. In the present study, the values of $\theta$ which represent the subjects focusing on accurate responses were used.




C. Out-of-Sample Prediction 

For the line length discrimination task of [12], the five different mean drift rates $\nu_1$ through $\nu_5$ relate to five specific tasks comparing a 32.00 millimeter long line with either a 32.27, 32.59, 33.23, 33.87, or a 34.51 millimeter long line. We use a linear fit to estimate mean drift rates for line comparisons not investigated in [12]. Let $\Delta l$ represent the length difference between the two lines, such that

$$\Delta l = l_R - l_L, \tag{7}$$

where $l_R$ and $l_L$ represent the lengths of the right and left lines as presented to the subjects. Linear regression was applied to the coordinate pairs $(\Delta l, \nu)$ for each subject of the line length discrimination task in [12] (Figure 1). All subject drift rates appear to follow a linear relationship. With this relationship in mind, the human parameter sets in equation (6) can be rewritten with $\nu = \nu_m \Delta l$, such that

$$S(\Delta l) = \{\nu_m \Delta l, \eta, s_z, \theta, \tau, c_1, c_2, c_3, c_4, c_5\}. \tag{8}$$

Here $\nu_m$ is the slope of the linear fit as given for each subject in Figure 1.
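The out-of-sample prediction in (7)-(8) amounts to a one-parameter least-squares fit. The sketch below assumes, as $\nu = \nu_m \Delta l$ in (8) does, that the line passes through the origin; it does not model an intercept. The names `fit_drift_slope` and `predicted_drift` are hypothetical, and the drift rates used to exercise it are synthetic, not values from [12].

```python
import numpy as np

# Length differences (mm) between the comparison lines and the 32.00 mm line in [12]
DELTA_L = np.array([32.27, 32.59, 33.23, 33.87, 34.51]) - 32.00


def fit_drift_slope(delta_l, nu):
    """Least-squares slope nu_m of the zero-intercept model nu = nu_m * delta_l."""
    x = np.asarray(delta_l, dtype=float)
    y = np.asarray(nu, dtype=float)
    return float(x @ y / (x @ x))


def predicted_drift(nu_m, l_right, l_left):
    """Out-of-sample mean drift rate per (7)-(8): nu = nu_m * (l_R - l_L)."""
    return nu_m * (l_right - l_left)
```

When a pair $(l_i, l_j)$ from an M-ary set is compared, one reasonable convention (an assumption here) is to treat $l_i$ as the right line and $l_j$ as the left line, so that a positive drift favors the event $(l_i > l_j)$.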



III. M-ary Extension Methodology 



A. Successive Pairwise Comparison Aggregation 

Using the out-of-sample prediction method described by the parameter set $S(\Delta l)$ in (8), we can formulate an M-ary human simulator which determines the longest line using successive pairwise comparisons. Let the lengths of $M$ lines be given by $L_M = [l_1, \ldots, l_i, \ldots, l_M]$, and let $\omega_i$ represent the event that the $i$th line is deemed the longest. We denote $\Omega_M = \{\omega_1, \ldots, \omega_i, \ldots, \omega_M\}$ and $\mathcal{N}_M = \{1, 2, \ldots, M\}$.

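Given a group of $N$ human sources, let $P^{(n)}_{L_M}(\omega_i)$ represent the subjective probability function which describes the $n$th source’s confidence towards each $\omega_i \in \Omega_M$, where $n \in \mathcal{N}_N$. Assuming that the line lengths described by $L_M$ are distinct, a unique maximum $l^*$ must exist in $L_M$. The subjective probabilities $P^{(n)}_{L_M}(\omega_i)$ can be represented as

$$P^{(n)}_{L_M}(\omega_i) = P^{(n)}\left((l_i = l^*) \mid l^* \in L_M\right). \tag{9}$$

Expanding the conditional probability yields

$$P^{(n)}_{L_M}(\omega_i) = \frac{P^{(n)}\left((l_i = l^*) \cap (l^* \in L_M)\right)}{P^{(n)}(l^* \in L_M)} = \frac{P^{(n)}(l_i = l^*)}{P^{(n)}\left(\bigcup_{j \in \mathcal{N}_M} (l_j = l^*)\right)}. \tag{10}$$

Since we assumed the existence of a unique maximum length $l^*$, we have that $(l_i = l^*) \cap (l_j = l^*) = \emptyset$ for all $j \in \mathcal{N}_M$ where $j \neq i$. Hence (10) can be reduced to

$$P^{(n)}_{L_M}(\omega_i) = \frac{P^{(n)}(l_i = l^*)}{\sum_{j \in \mathcal{N}_M} P^{(n)}(l_j = l^*)}. \tag{11}$$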



The event $(l_i = l^*)$ can be thought of as every other line $l_j$ having shorter length than $l_i$, where $j \neq i$. Hence,

$$(l_i = l^*) = \bigcap_{j \in \mathcal{N}_M,\, j \neq i} (l_i > l_j). \tag{12}$$



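Combining (11) and (12), and assuming that belief in $(l_i > l_j)$ is independent for all $i, j \in \mathcal{N}_M$, yields

$$P^{(n)}_{L_M}(\omega_i) = \frac{\prod_{j \in \mathcal{N}_M,\, j \neq i} P^{(n)}_{\mathcal{A}}(l_i > l_j)}{\sum_{k=1}^{M} \prod_{j \in \mathcal{N}_M,\, j \neq k} P^{(n)}_{\mathcal{A}}(l_k > l_j)}, \tag{13}$$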



where $\mathcal{A} = \{(l_i > l_j), (l_i < l_j)\}$. The subjective probabilities $P^{(n)}_{\mathcal{A}}(l_i > l_j)$ for any $i, j \in \mathcal{N}_M$ can be realized using the 2DSD human tuples of [12] and applying the linear fits for the mean drift rates as shown in (8). Suppose that $a \in \mathcal{A}$ and $p \in [0, 1]$ are, respectively, the decision and confidence values associated with a realization of the $n$th subject using the 2DSD algorithm. The probability assignment for the event $(l_i > l_j)$ for any $i, j \in \mathcal{N}_M$ is

$$P^{(n)}_{\mathcal{A}}(l_i > l_j) = \begin{cases} p, & a = (l_i > l_j), \\ 1 - p, & a = (l_i < l_j). \end{cases} \tag{14}$$



After realizing $P^{(n)}_{\mathcal{A}}(l_i > l_j)$ for every $i, j \in \mathcal{N}_M$, (13) can be used to create the belief probabilities associated with each line length in $L_M$ being the longest (i.e., $P^{(n)}_{L_M}(\omega_i)$). The longest line can be determined by choosing the $\omega_i$ with the highest belief, that is

$$\omega_{i^*} = \arg\max_{\omega_i \in \Omega_M} P^{(n)}_{L_M}(\omega_i), \tag{15}$$

with a corresponding confidence value of $p_{i^*} = P^{(n)}_{L_M}(\omega_{i^*})$. Since 2DSD models choose from a finite set of confidence values [12], the following three cases can occur: $\omega_{i^*}$ is unique, $\omega_{i^*}$ is not unique, or $\omega_{i^*}$ does not exist because the denominator of (13) is zero (which can happen when confidence values of 1.00 leave a zero factor in every product of (13)). In the second case, a decision can be made by choosing one of the maximizing alternatives at random (i.e., assuming all are equally likely). In the third case, a decision cannot be reached and a “no decision” state is returned.
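The aggregation in (13)-(15), including the random tie-breaking and the “no decision” case, can be sketched as follows. This sketch assumes one 2DSD realization per unordered pair, with the reverse comparison set by complementarity, $P(l_j > l_i) = 1 - P(l_i > l_j)$; the text does not state whether the reverse order is simulated separately. The names `aggregate_pairwise` and `pairwise_matrix` are hypothetical, and `None` stands in for the “no decision” state.

```python
import numpy as np


def aggregate_pairwise(P, rng=None):
    """Combine pairwise beliefs into an M-ary decision via (13) and (15).

    P[i, j] holds P(l_i > l_j) for i != j (the diagonal is ignored).
    Returns (chosen index, confidence, distribution over lines), or
    (None, None, None) for the "no decision" state.
    """
    rng = np.random.default_rng() if rng is None else rng
    P = np.asarray(P, dtype=float)
    M = P.shape[0]
    off = ~np.eye(M, dtype=bool)                       # masks out j == i
    # Numerator of (13): product over j != i of P(l_i > l_j)
    scores = np.array([P[i, off[i]].prod() for i in range(M)])
    total = scores.sum()                               # denominator of (13)
    if total == 0.0:
        return None, None, None                        # every product has a zero factor
    probs = scores / total
    best = np.flatnonzero(probs == probs.max())
    i_star = int(rng.choice(best))                     # ties broken uniformly at random
    return i_star, float(probs[i_star]), probs


def pairwise_matrix(decisions, confidences, M):
    """Build P from one binary realization per unordered pair, following (14).

    decisions[(i, j)] is True if the subject judged l_i > l_j (with i < j);
    the reverse entry is set to 1 - P[i, j] (an assumption of this sketch).
    """
    P = np.full((M, M), np.nan)
    for (i, j), said_i_longer in decisions.items():
        p = confidences[(i, j)]
        P[i, j] = p if said_i_longer else 1.0 - p
        P[j, i] = 1.0 - P[i, j]
    return P
```

In a full simulator, each entry of `decisions` and `confidences` would come from one 2DSD realization driven by the mean drift rate in (8) for that pair of lines.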



Fig. 1. Linear fits of subject mean drift rates versus line length differences for the line length discrimination task as presented in [12], shown in panels (a)-(f) for Subjects 1-6. Equations and $R^2$ values shown for each subject.

B. Assessing Subject Performance

Similar to equations (4) and (5), the 2DSD-based M-ary line length discrimination task simulator yields a single decision amongst $M$ alternatives (denoted $\omega_{i^*}$) and a corresponding decision confidence (denoted $p_{i^*}$). Writing this decision and confidence pair as a subjective probability assignment yields

$$P^{(n)}_{L_M}(\omega_{i^*}) = p_{i^*}, \tag{16}$$

$$P^{(n)}_{L_M}(\bar{\omega}_{i^*}) = 1 - p_{i^*}, \tag{17}$$

for a given subject $n \in \mathcal{N}_N$, where $\bar{\omega}_{i^*} \subset \Omega_M$ represents the negation of $\omega_{i^*}$. Let $\omega^* \in \Omega_M$ represent the true outcome of $\Omega_M$. Ideally, subject $n$ should assign full belief (i.e., probability one) to the correct outcome $\omega^*$. As subject $n$ assigns less belief to $\omega^*$, the quality of this person’s opinion decreases. Motivated by [12], we denote this idea of subject opinion quality as evidence strength, $\xi$, where

$$\xi(\omega_{i^*}, p_{i^*} \mid \omega^*) = \begin{cases} 1 - (1 - p_{i^*})^2, & \omega_{i^*} = \omega^*, \\ 1 - (p_{i^*})^2, & \omega_{i^*} \neq \omega^*. \end{cases} \tag{18}$$



Evidence strength is derived from the quadratic scoring rule known as the Brier score [16]. An evidence strength value of one means that the subject has chosen the correct outcome and assigned it probability one. An evidence strength value of zero means that the subject has chosen an incorrect outcome and assigned it probability one.
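Evidence strength (18), and the per-source average $\bar{\xi}^{(n)}$ used later for discounting, can be computed directly. The function names below are hypothetical; this is a minimal sketch, not the simulator used in this study.

```python
def evidence_strength(chosen, p, truth):
    """Evidence strength (18): a Brier-type score in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return 1.0 - (1.0 - p) ** 2 if chosen == truth else 1.0 - p ** 2


def average_evidence_strength(trials):
    """Mean of (18) over (chosen, p, truth) triples, e.g. one subject's trials."""
    scores = [evidence_strength(c, p, t) for c, p, t in trials]
    return sum(scores) / len(scores)
```

Note that a wrong answer given with confidence 0.5 still scores 0.75, so a hedging subject is penalized far less than a confidently wrong one.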



C. Simulated Performance of Subjects 

All six subject tuples from [12, Tables 3 and 6] were simulated using the human tuples described by (8) and the mean drift rate regressions of Figure 1. The line lengths presented to the simulated subjects were $L_M = \{32, 32 + d, 32 + 2d, \ldots, 32 + (M - 1)d\}$, where $M$ was the number of lines being compared and $d$ was the incremental length difference between lines, in millimeters. Subject decisions and confidence assessments were generated using the successive pairwise comparison aggregation method of Section III-A. The evidence strengths of each subject were determined and averaged over 10,000 trials for $d = 0.01$ to $d = 1.0$ in increments of 0.01 and for $M = 2$, 4, 6, and 8. For each subject, trials which produced the “no decision” state were repeated until a decision and confidence value were reached.



Figure 2 shows the average evidence strength, $\xi$, of each subject versus the incremental line length difference, $d$. Evidence strengths for different numbers of alternatives $M$ are also shown for each subject. As expected, increasing the perceptual difficulty of the task (i.e., decreasing $d$) decreased subject performance. For large enough $d$ (e.g., $d > 0.60$), increasing the number of alternatives was found to have little effect on subject performance. For smaller $d$ (e.g., $d < 0.40$), the largest decrease in performance caused by increasing the number of alternatives occurred when going from $M = 2$ to $M = 4$ alternatives. For $M > 4$, however, subject performance was similar to the $M = 4$ case. This outcome seems logical, as increasing the number of alternatives without changing the task difficulty will result in some alternatives being easier to rule out than others (e.g., the shortest lines will be more easily discerned as not being the longest).

D. Assumptions and Limitations 

Our method assumes that the subjects perform pairwise successive comparisons on every possible pair of alternatives amongst a larger set of alternatives. For the line length discrimination task, any two lines in a set of lines which are clearly different in length would result in larger mean drift rate values, which according to 2DSD will produce exponentially faster response times [12]. According to our extension methodology, simulated subjects will spend less time deliberating between pairs of lines which are clearly different in length. In reality, human subjects may not even make use of a pairwise comparison technique for lines which are clearly different in length. The methodology also assumes that a linear relationship for the out-of-sample prediction best describes how changing observations (i.e., line lengths) influences the mean drift rate parameter. Although the information in Figure 1 seems to support this notion, a linear relationship for the out-of-sample prediction may become less accurate as the line length difference approaches zero and the subjects reach their perceptual limit.

Fig. 2. Simulated averages of evidence strengths, $\xi$, for all six 2DSD subject models from [12, Tables 3 and 6] under the 2DSD M-ary human response simulator for the line length discrimination task versus the incremental line length difference, $d$. Average evidence strengths shown for $M = 2$, 4, 6, 8 alternatives. Averages obtained over 10,000 trials of the M-ary simulation algorithm for each subject, while repeating trials which produced the “no decision” state for each subject.

IV. Fusion Algorithm Simulation 

We used the proposed M-ary human response simulation method to assess the combination of human responses using several belief fusion operators: Bayes’ rule of probability combination [17]; Dempster’s Rule of Combination (DRC) [18]; Yager’s Rule [19]; Dubois and Prade’s rule (DPR) [20]; and the Proportional Conflict Redistribution Rule #5 (PCR5) [21]. The literature provides an abundance of information on the implementation of various Dempster-Shafer theory concepts (e.g., [18], [21], [22]). For the sake of brevity, we focus here only on those concepts which are pertinent to the fusion simulation examples presented in this study.
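The definitions of these operators are left to the cited references. For orientation, a minimal sketch of their common two-source textbook forms follows, with BMAs represented as Python dicts mapping frozensets of outcomes to masses (a representation chosen for this sketch). The function names are hypothetical, and Bayes’ rule of probability combination is taken here as the normalized product of two probability assignments (i.e., a uniform prior). More sources can be folded in one at a time; Yager’s rule, DPR, and PCR5 are not associative, so their results depend on the combination order.

```python
from itertools import product


def _conjunctive(m1, m2):
    """Unnormalized conjunctive combination; returns (masses, conflict K)."""
    out, K = {}, 0.0
    for (B, b), (C, c) in product(m1.items(), m2.items()):
        X = B & C
        if X:
            out[X] = out.get(X, 0.0) + b * c
        else:
            K += b * c
    return out, K


def dempster(m1, m2):
    """DRC: conjunctive combination renormalized by 1 - K."""
    out, K = _conjunctive(m1, m2)
    if K >= 1.0:
        raise ZeroDivisionError("total conflict: Dempster's rule is undefined")
    return {X: v / (1.0 - K) for X, v in out.items()}


def yager(m1, m2, omega):
    """Yager's rule: the conflicting mass K is moved to the whole frame."""
    out, K = _conjunctive(m1, m2)
    out[omega] = out.get(omega, 0.0) + K
    return out


def dubois_prade(m1, m2):
    """DPR: mass of each conflicting pair goes to the union B | C."""
    out = {}
    for (B, b), (C, c) in product(m1.items(), m2.items()):
        X = (B & C) or (B | C)          # empty intersection falls back to the union
        out[X] = out.get(X, 0.0) + b * c
    return out


def pcr5(m1, m2):
    """Two-source PCR5: each partial conflict b*c goes back to B and C
    proportionally to the masses b and c involved."""
    out, _ = _conjunctive(m1, m2)
    for (B, b), (C, c) in product(m1.items(), m2.items()):
        if not (B & C) and b + c > 0:
            out[B] = out.get(B, 0.0) + b * b * c / (b + c)
            out[C] = out.get(C, 0.0) + c * c * b / (b + c)
    return out


def bayes(p1, p2):
    """Bayes' rule of probability combination with a uniform prior."""
    prod = {w: p1[w] * p2[w] for w in p1}
    z = sum(prod.values())
    if z == 0.0:
        raise ZeroDivisionError("total conflict: Bayes' rule is undefined")
    return {w: v / z for w, v in prod.items()}
```

The two `ZeroDivisionError` branches correspond to the total-conflict situation that prevented the use of Bayes’ rule and DRC without discounting in Section V.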

A. Fusion Operator Inputs 

For each subject $n \in \mathcal{N}_N$, let the simulated decision and confidence values be given as $\omega^{(n)}_{i^*} \in \Omega_M$ and $p^{(n)}_{i^*} \in [0, 1]$.



Excluding Bayes’ rule of probability combination, the belief fusion operators investigated here use inputs known as belief mass assignments (BMAs). BMAs can be thought of as assessing evidence on the powerset of alternatives, allowing the user to specify evidence imprecisely (i.e., evidence towards a disjunction of alternatives rather than the alternatives themselves) [18]. In the current study, the BMAs $m^{(n)}(X)$ were formulated for each subject such that

$$m^{(n)}(X) = \begin{cases} p^{(n)}_{i^*}, & X = \{\omega^{(n)}_{i^*}\}, \\ 1 - p^{(n)}_{i^*}, & X = \Omega_M, \\ 0, & \text{otherwise}, \end{cases} \tag{19}$$

for all $X \subseteq \Omega_M$. Fusion using Bayes’ rule of probability combination was performed on the subjective probability assignments defined for each $\omega \in \Omega_M$ as

$$P^{(n)}(\omega) = \begin{cases} p^{(n)}_{i^*}, & \omega = \omega^{(n)}_{i^*}, \\ (M - 1)^{-1}\left(1 - p^{(n)}_{i^*}\right), & \text{otherwise}. \end{cases} \tag{20}$$

A vacuous BMA [18] or an equiprobable subjective probability assignment was used whenever the simulated subjects returned the “no decision” state. For each fusion method evaluated, the source combination order was chosen by sampling each of the thirty-six sources with equal probability.

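We defined evidence strength $\xi$ in (18). Let $\bar{\xi}^{(n)}$ be the average evidence strength of the $n$th source. For BMAs, it is possible to account for source reliability through the discounting operation [22]

$$m^{(n)}_{\bar{\xi}}(X) = \begin{cases} \bar{\xi}^{(n)} m^{(n)}(X), & X \neq \Omega_M, \\ \bar{\xi}^{(n)} m^{(n)}(X) + \left(1 - \bar{\xi}^{(n)}\right), & X = \Omega_M. \end{cases} \tag{21}$$

As in [13], we define the discounting operation analogously for subjective probabilities as

$$P^{(n)}\left(\omega \mid \bar{\xi}^{(n)}\right) = \bar{\xi}^{(n)} P^{(n)}(\omega) + |\Omega_M|^{-1}\left(1 - \bar{\xi}^{(n)}\right), \tag{22}$$

where $|\Omega_M|$ is the cardinality of $\Omega_M$.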
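The source inputs (19)-(20), including the vacuous and equiprobable fallbacks, and the discounting operations (21)-(22) can be sketched in the same dict-of-frozensets representation. Using `None` for the “no decision” state and the function names are conventions of this sketch, not of the original simulator.

```python
def make_bma(choice, p, omega):
    """BMA of (19): p on the chosen singleton, 1 - p on the whole frame."""
    if choice is None:                     # "no decision": vacuous BMA
        return {omega: 1.0}
    m = {frozenset([choice]): p}
    m[omega] = m.get(omega, 0.0) + (1.0 - p)
    return m


def make_probability(choice, p, omega):
    """Subjective probability of (20); equiprobable for "no decision"."""
    M = len(omega)
    if choice is None:
        return {w: 1.0 / M for w in omega}
    return {w: p if w == choice else (1.0 - p) / (M - 1) for w in omega}


def discount_bma(m, xi, omega):
    """Discounting (21) with reliability xi in [0, 1]."""
    out = {X: xi * v for X, v in m.items()}
    out[omega] = out.get(omega, 0.0) + (1.0 - xi)
    return out


def discount_probability(P, xi):
    """Probability discounting (22): mix with the uniform distribution."""
    M = len(P)
    return {w: xi * v + (1.0 - xi) / M for w, v in P.items()}
```

Both discounting functions leave a fully reliable source ($\bar{\xi}^{(n)} = 1$) unchanged and turn a completely unreliable one ($\bar{\xi}^{(n)} = 0$) into the vacuous BMA or the uniform distribution, respectively.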

B. Fusion Operator Performance Metrics 

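The combination methods we study here can all be thought of as producing a class of subjective probability assignments [23] on each $\omega \in \Omega_M$, defined by the belief and plausibility [18] ranges $[\text{Bel}(\omega), \text{Pl}(\omega)]$, where

$$\text{Bel}(X) = \sum_{Z \subseteq X} m(Z) \;\Rightarrow\; \text{Bel}(\omega) = m(\{\omega\}), \tag{23}$$

and

$$\text{Pl}(X) = \sum_{\substack{Z \subseteq \Omega_M \\ Z \cap X \neq \emptyset}} m(Z) \;\Rightarrow\; \text{Pl}(\omega) = \sum_{\substack{Z \subseteq \Omega_M \\ \omega \in Z}} m(Z). \tag{24}$$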

In the case of Bayes’ rule of probability combination, a single subjective probability assignment is produced. Similar to the performance of the subjects in Section III-B, we take the performance of the fusion operators as a measure of the nearness of the subjective probability assignments to one which assigns the truth $\omega^* \in \Omega_M$ probability one. The result is a class of evidence strengths defined by the intervals $[\xi_{\text{Bel}}, \xi_{\text{Pl}}]$, where

$$\xi_{\text{Bel}} = \xi\left(\omega^*, \text{Bel}(\omega^*) \mid \omega^*\right), \tag{25}$$

and

$$\xi_{\text{Pl}} = \xi\left(\omega^*, \text{Pl}(\omega^*) \mid \omega^*\right). \tag{26}$$

The lower envelope $\xi_{\text{Bel}}$ can be thought of as a measure of the accuracy of the combination operator, and the width of the interval, $\xi_{\text{Pl}} - \xi_{\text{Bel}}$, can be thought of as a measure of its precision (a narrower interval being more precise). Accurate belief combination operators will tend to assign probability one to the correct outcome, resulting in values of $\xi_{\text{Bel}}$ close to one. Precise belief combination operators will tend to produce more specific evidence, resulting in values of $\xi_{\text{Pl}} - \xi_{\text{Bel}}$ close to zero, and hence values of $1 - (\xi_{\text{Pl}} - \xi_{\text{Bel}})$ close to one. Since $\text{Bel}(\omega^*) \leq \text{Pl}(\omega^*) \leq 1$ [18] and $\xi$ is increasing in its confidence argument when the assessed outcome is the truth, it follows that $\xi_{\text{Bel}} \leq \xi_{\text{Pl}} \leq 1$. Hence systems with high accuracy (i.e., $\xi_{\text{Bel}}$ close to one) will also be very precise (i.e., $\xi_{\text{Pl}} - \xi_{\text{Bel}}$ close to zero). In the Bayesian case, $\xi_{\text{Pl}} - \xi_{\text{Bel}} = 0$ since $\text{Bel}(\omega^*) = \text{Pl}(\omega^*) = P(\omega^*)$ [18].
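Belief (23), plausibility (24), and the evidence strength interval (25)-(26), together with the accuracy and precision metrics derived from it, can be computed from any combined BMA as sketched below. The names are hypothetical; the sketch reuses the dict-of-frozensets representation and assumes the BMA carries no mass on the empty set.

```python
def belief(m, X):
    """Bel(X) of (23): total mass of non-empty focal sets contained in X."""
    return sum(v for Z, v in m.items() if Z and Z <= X)


def plausibility(m, X):
    """Pl(X) of (24): total mass of focal sets that intersect X."""
    return sum(v for Z, v in m.items() if Z & X)


def strength_interval(m, truth):
    """Return (xi_Bel, xi_Pl) of (25)-(26) for the true outcome `truth`."""
    X = frozenset([truth])

    def xi(q):
        # Evidence strength (18) when the assessed outcome is the truth
        return 1.0 - (1.0 - q) ** 2

    return xi(belief(m, X)), xi(plausibility(m, X))


def accuracy_precision(m, truth):
    """Accuracy xi_Bel and precision 1 - (xi_Pl - xi_Bel)."""
    lo, hi = strength_interval(m, truth)
    return lo, 1.0 - (hi - lo)
```

For a Bayesian (singleton-only) assignment, belief and plausibility coincide, so the sketch returns a precision of exactly one, matching the remark above.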



C. Simulation Overview 

The M-ary human response simulator of Section III was used to simulate decisions and confidence values over 10,000 trials using six responses from each subject in [12, Tables 3 and 6] under the line length discrimination task (i.e., thirty-six total sources). Subjects were simulated using the line lengths $L_M = \{32, 32 + d, 32 + 2d, \ldots, 32 + (M - 1)d\}$, with an incremental line length difference $d = 0.20$ mm and $M = 2$, $M = 4$, and $M = 8$ alternatives. The performance metrics $\xi_{\text{Bel}}$ and $1 - (\xi_{\text{Pl}} - \xi_{\text{Bel}})$ (i.e., accuracy and precision) of each fusion method were determined and averaged over the 10,000 trials of the simulation.

V. Results 

Figure 3 shows the accuracy performance (i.e., $\xi_{\text{Bel}}$) of each of the five fusion methods mentioned in Section IV versus the number of sources present in the combination. The accuracy performance (i.e., evidence strengths) of the best and worst subjects are shown for comparison. Similarly, Figure 4 shows the precision performance (i.e., $1 - (\xi_{\text{Pl}} - \xi_{\text{Bel}})$) of each of the five fusion methods. The subplots of both Figure 3 and Figure 4 show the number of alternatives $M$ simulated and the results obtained with and without the evidence strength discounting of equations (21) and (22). Higher values of $\xi_{\text{Bel}}$ and $1 - (\xi_{\text{Pl}} - \xi_{\text{Bel}})$ indicate higher combination accuracy and precision, respectively.

When no source discounting was performed, Bayes’ rule of probability combination and DRC could not be used, because the probability that two simulated subjects would present totally conflicting evidence was non-negligible¹. With this situation in mind, we make note of the following observations.

• When source discounting was performed using average source evidence strength, Bayes’ rule of probability combination and DRC exhibited similar accuracy performance (Figures 3d-3f).

• The number of alternatives was observed to have a stronger impact on the accuracy performance when source discounting was not performed (Figure 3). Similar to the subject performance results (Figure 2), the largest decrease in accuracy performance occurred when going from 2 to 4 alternatives. The decrease was smaller when going from 4 to 8 alternatives.

• When source discounting was performed, PCR5, Bayes’ rule of probability combination, and DRC exhibited similar performance as long as there were twelve or fewer human responses in the combination. When more than twelve human responses were included in the combination, Bayes’ rule of probability combination and DRC exhibited higher accuracy performance than PCR5 (Figures 3d-3f).

• When source discounting was performed, PCR5 and DRC precision increased as the number of sources present in the combination increased. Eventually the precision converged to one for both PCR5 and DRC. This convergence was observed to occur more quickly with PCR5 than with DRC. Additionally, increasing the number of alternatives was observed to decrease the rate of convergence for both PCR5 and DRC (Figures 4d-4f).

• When source discounting was not performed, the precision performance of PCR5 was found to be the same regardless of the number of line length task alternatives (Figures 4a-4c).

• Yager’s rule and Dubois and Prade’s rule exhibited inferior accuracy performance in all cases (Figure 3). Both rules exhibited accuracy performance that was worse than that of the worst single source present in the combination. Furthermore, Yager’s rule and Dubois and Prade’s rule also exhibited the lowest precision performance (Figure 4).

¹Totally conflicting evidence results in a division by zero in the equations for Bayes’ rule of probability combination and for DRC.

Fig. 3. Average accuracy performance (i.e., $\xi_{\text{Bel}}$) for each of the fusion methods mentioned in Section IV versus the number of sources present in the combination (higher is better). Panels (a)-(c): 2, 4, and 8 lines without evidence strength discounting; panels (d)-(f): 2, 4, and 8 lines with evidence strength discounting. The evidence strengths for the best and worst subjects in the combination are also shown for comparison.

VI. Conclusions 

We have shown how the 2DSD human simulator of [12] can be applied to determine the average performance (i.e., accuracy and precision) of fusion operators which use human opinions on M-ary decision problems. We make use of confidence assessment aggregation through successive pairwise comparisons. The same approach can be used with other human decision-making models that provide decisions along with assessments of confidence in these decisions. Here, we used an M-ary line length task simulator as an example to evaluate the accuracy and precision of Bayes’ rule of probability combination, Dempster’s Rule of Combination (DRC), Yager’s rule, Dubois and Prade’s rule, and the Proportional Conflict Redistribution Rule #5 (Figures 3 and 4). It was observed that the accuracy of Bayes’ rule of probability combination and DRC was minimally affected when incorporating subjective data through confidence assessments. After combination of ten to fifteen sources, Bayes’ rule of probability combination and DRC were found to exhibit the highest accuracy performance when source discounting was performed. PCR5 was found to exhibit accuracy performance at least as good as that of the best source in the combination across all fusion cases. Yager’s rule and Dubois and Prade’s rule were found to exhibit inferior performance, with the lowest accuracy and precision values.